Classification Model Predicting White Wine Quality¶

Introduction¶

Vinho Verde is renowned for its savoury taste, fresh colour and stress-relieving benefits. Among the variants of Vinho Verde, white Vinho Verde stands out as the most promising in the global market. A study suggests that the global dry white wine market surged in 2022 and is expected to maintain an upward trend until 2030 (Market Reports World, 2023). This surging demand in the dry white wine market makes quality classification increasingly significant; we have therefore designed a k-nearest-neighbor classification model that predicts the quality of white Vinho Verde from the wine's chemical composition with reasonable accuracy.

In our project, we will try to answer the question: “How can we predict the level of quality of the White Vinho Verde given the physicochemical attributes in our dataset?”

We utilized the Wine Quality dataset from the UC Irvine Machine Learning Repository, which features 11 physicochemical attributes of each wine, such as fixed acidity, citric acid, residual sugar, and density, along with a sensory quality score.

Most of the variables, besides quality, are quantitative. Our dataset focuses on the white variant of Vinho Verde, in which most variables are measured in grams/dm^3, with the exceptions of free_sulfur_dioxide (milligrams/dm^3), total_sulfur_dioxide (milligrams/dm^3), and pH (measured on a scale from 0 to 14) (Cortez, Cerdeira, Almeida, Matos, & Reis, 2009). Additionally, the dataset contains 4898 observations with no missing values. Our project involves cleaning and preprocessing the Vinho Verde dataset, selecting predictors, tuning k, and fitting a k-nearest-neighbor classification model that predicts wine quality on a 0–10 scale, where higher scores indicate better quality.

In summary, this document provides a thorough list of procedures for our development of an accurate white Vinho Verde wine quality classification model.

Methods & Results¶

In [93]:
install.packages("themis")
install.packages("GGally")

In [94]:
# Run This Cell Before Continuing
set.seed(9999) 
library(repr)
library(tidyverse)
library(tidymodels)
library(themis)
library(janitor)
library(cowplot)
library(GGally)

Downloading the data for use in our analysis.

In [95]:
url <- "https://raw.githubusercontent.com/TrBili/dsci-100-project/main/data_2/winequality-white.csv"
download.file(url, "data/winequality-white.csv")

Reading the data from the downloaded file.

In [96]:
wine_data_raw <- read_csv2("data/winequality-white.csv")

head(wine_data_raw)
ℹ Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.

Rows: 4898 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (6): volatile acidity, citric acid, residual sugar, chlorides, density, ...
dbl (1): quality
num (5): fixed acidity, free sulfur dioxide, total sulfur dioxide, pH, alcohol

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A tibble: 6 × 12
  fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density  pH   sulphates  alcohol  quality
  <dbl>          <chr>             <chr>        <chr>           <chr>      <dbl>                <dbl>                 <chr>    <dbl> <chr>     <dbl>    <dbl>
  7              0.27              0.36         20.7            0.045      45                   170                   1.001    3     0.45      88       6
  63             0.3               0.34         1.6             0.049      14                   132                   0.994    33    0.49      95       6
  81             0.28              0.4          6.9             0.05       30                   97                    0.9951   326   0.44      101      6
  72             0.23              0.32         8.5             0.058      47                   186                   0.9956   319   0.4       99       6
  72             0.23              0.32         8.5             0.058      47                   186                   0.9956   319   0.4       99       6
  81             0.28              0.4          6.9             0.05       30                   97                    0.9951   326   0.44      101      6

We can see that some numerical variables were read with a chr data type (read_csv2 assumes a comma decimal mark, so columns with a period decimal were parsed as character); we need to convert them to numeric before using them in our model. We can also see that the variable names contain spaces, so we need to clean them for use. Finally, we have to convert the quality column to a factor, as it will serve as our class (categorical variable) throughout this analysis.

We will now clean our data to make it suitable for Exploratory Data Analysis.

In [97]:
wine_data <- wine_data_raw |> 
                clean_names() |>                        
                drop_na() |> # removes rows with NA 
                map_df(as.numeric) |> # as all our columns are numeric
                mutate(quality = as_factor(quality)) # we will use quality as our class
                

head(wine_data)
print("Table 1 : Wine Data")
A tibble: 6 × 12
  fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  p_h  sulphates  alcohol  quality
  <dbl>          <dbl>             <dbl>        <dbl>           <dbl>      <dbl>                <dbl>                 <dbl>    <dbl> <dbl>     <dbl>    <fct>
  7              0.27              0.36         20.7            0.045      45                   170                   1.0010   3     0.45      88       6
  63             0.30              0.34         1.6             0.049      14                   132                   0.9940   33    0.49      95       6
  81             0.28              0.40         6.9             0.050      30                   97                    0.9951   326   0.44      101      6
  72             0.23              0.32         8.5             0.058      47                   186                   0.9956   319   0.40      99       6
  72             0.23              0.32         8.5             0.058      47                   186                   0.9956   319   0.40      99       6
  81             0.28              0.40         6.9             0.050      30                   97                    0.9951   326   0.44      101      6
[1] "Table 1 : Wine Data"

All the unique values in the quality column

In [98]:
wine_data |> distinct(quality)
A tibble: 7 × 1
quality
<fct>
6
5
7
8
4
3
9

Using the clean data, we will split our data into training and testing sets, then perform exploratory data analysis.

In [99]:
# Set the seed. Don't remove this!
set.seed(9999) 

wine_split <- initial_split(wine_data, prop=0.75,strata=quality)

## Training Data
wine_train <- training(wine_split)

## Testing Data
wine_test <- testing(wine_split)

head(wine_train)
print("Table 2 : Wine Training Data")
head(wine_test)
print("Table 3 : Wine Testing Data")
A tibble: 6 × 12
  fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  p_h  sulphates  alcohol  quality
  <dbl>          <dbl>             <dbl>        <dbl>           <dbl>      <dbl>                <dbl>                 <dbl>    <dbl> <dbl>     <dbl>    <fct>
  81             0.27              0.41         1.45            0.033      11                   63                    0.9908   299   0.56      12       5
  86             0.23              0.40         4.20            0.035      17                   109                   0.9947   314   0.53      97       5
  79             0.18              0.37         1.20            0.040      16                   75                    0.9920   318   0.63      108      5
  83             0.42              0.62         19.25           0.040      41                   172                   1.0002   298   0.67      97       5
  62             0.66              0.48         1.20            0.029      29                   75                    0.9892   333   0.39      128      8
  76             0.67              0.14         1.50            0.074      25                   168                   0.9937   305   0.51      93       5
[1] "Table 2 : Wine Training Data"
A tibble: 6 × 12
  fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  p_h  sulphates  alcohol  quality
  <dbl>          <dbl>             <dbl>        <dbl>           <dbl>      <dbl>                <dbl>                 <dbl>    <dbl> <dbl>     <dbl>    <fct>
  7              0.27              0.36         20.7            0.045      45                   170                   1.0010   3     0.45      88       6
  65             0.31              0.14         7.5             0.044      34                   133                   0.9955   322   0.50      95       5
  68             0.26              0.42         1.7             0.049      41                   122                   0.9930   347   0.48      105      8
  66             0.27              0.41         1.3             0.052      16                   142                   0.9951   342   0.47      10       6
  69             0.24              0.35         1.0             0.052      35                   146                   0.9930   345   0.44      10       6
  85             0.24              0.39         10.4            0.044      20                   142                   0.9974   32    0.53      10       6
[1] "Table 3 : Wine Testing Data"

We will now be doing Exploratory Data Analysis on our training set.

In [100]:
## Setting the Width & Height of the Plot
options(repr.plot.width=8,repr.plot.height=25)

## Extracting all the column names from our clean Dataset
all_cols <- wine_train |> select(-quality) |> colnames()

## Extracting all the column names from our raw Dataset
col_names <- wine_data_raw |> colnames()

## Creating a list to store all our plots
plots <- list()

## Loop Variable
i <- 0

## Looping through each column
for(c in all_cols) {
    i <- i + 1
    c_sym <- sym(c)
    box_plot <- ggplot(wine_train, aes(x=quality,y=!!c_sym)) +
            geom_boxplot() +
            labs(x="Quality", y=col_names[i], subtitle=paste("Fig 1.",i))
    plots[[c]] <- box_plot
}

## Merging all the plots
plot_grid(plotlist = plots, ncol = 2)
[Figure: Fig 1.1–1.11 — box plots of each physicochemical variable by wine quality]

Our project establishes a classification model for white wine quality (on a 0–10 scale) based on the chemical composition of the wine. Before applying the k-nearest-neighbor classification engine to our dataset, we chose to eliminate variables/columns with minimal influence on the quality variable. To identify these less influential variables, box plots of each variable in the dataset are displayed above; they provide visual information on the median, range, interquartile range, and outliers of the data under each variable. From the box plots we can also infer the strength of influence each variable has on the response variable (quality): variables with similar medians and interquartile ranges across every quality value are considered less influential. As a result, we will exclude alcohol, p_h, free_sulfur_dioxide, chlorides, total_sulfur_dioxide, fixed_acidity and density. This process is achievable using tidyverse functions in R.

Observing the boxplot above, we can choose the following attributes.

  1. Volatile Acidity
  2. Citric Acid
  3. Residual Sugar
  4. Sulphates
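As an aside, this visual criterion can be made numeric. The sketch below is our own illustrative heuristic (the function `influence_score` is not part of the original analysis): a variable whose class-wise medians barely move relative to a typical within-class IQR scores low, suggesting weak influence on quality.

```r
# Illustrative heuristic (not from the original analysis): score each variable
# by how far its class-wise medians spread, relative to a typical class IQR.
influence_score <- function(x, class) {
  med_by_class <- tapply(x, class, median)   # one median per quality level
  iqr_by_class <- tapply(x, class, IQR)      # one IQR per quality level
  diff(range(med_by_class)) / median(iqr_by_class)
}

# Applied to the training data it could rank candidate predictors, e.g.:
# sapply(wine_train[setdiff(names(wine_train), "quality")],
#        influence_score, class = wine_train$quality)
```

Higher scores would indicate box plots whose boxes shift visibly across quality levels, which is exactly what we looked for above.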

We will now perform a summary analysis on our selected predictors from our training data, to further distinguish between relevant predictors.

In [101]:
## selecting the required variables
selected_wine_train_data <- wine_train |> 
                    select(quality, volatile_acidity, citric_acid, residual_sugar, sulphates)


## Summary of Training Data - Mean of Each Column & Count of Each Quality
summary_wine_train_data <- wine_train |>
                    group_by(quality) |>
                    summarize(mean_volatile_acidity = mean(volatile_acidity),
                             mean_citric_acid = mean(citric_acid),
                             mean_residual_sugar = mean(residual_sugar),
                             mean_sulphates = mean(sulphates),
                             total_count=n(),
                             percentage=(100*n()/nrow(wine_train)))

summary_wine_train_data
print("Table 4: Summary of Training Data")
A tibble: 7 × 7
  quality  mean_volatile_acidity  mean_citric_acid  mean_residual_sugar  mean_sulphates  total_count  percentage
  <fct>    <dbl>                  <dbl>             <dbl>                <dbl>           <int>        <dbl>
  3        0.3388462              0.3123077         6.303846             0.4953846       13           0.3539341
  4        0.3712605              0.3110924         4.562185             0.4784874       119          3.2398584
  5        0.3026685              0.3397814         7.413661             0.4832423       1098         29.8938198
  6        0.2609563              0.3398846         6.383849             0.4901882       1647         44.8407296
  7        0.2656024              0.3261145         5.287199             0.5022289       664          18.0778655
  8        0.2836047              0.3261240         5.593798             0.4753488       129          3.5121154
  9        0.2866667              0.3800000         1.933333             0.5033333       3            0.0816771
[1] "Table 4: Summary of Training Data"

The summary table above shows that our selected predictors vary across quality levels.

Total Count: The total_count column indicates the number of observations for each quality level. A significant imbalance is evident, with much more data for quality levels 5 and 6 compared to others. This could potentially bias a KNN model, and we might need to consider methods to address this class imbalance, such as upsampling.

Percentage: This column shows the percentage of observations in each quality level relative to the entire dataset. Quality 5 and 6 make up a large percentage of the data, indicating that the dataset is imbalanced, which could influence the KNN classifier's performance.
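Conceptually, the rebalancing we apply in the next step resamples minority-class rows with replacement until every class matches the majority count. A minimal base-R sketch on a hypothetical toy vector (an illustration of the idea, not the actual themis::step_upsample implementation):

```r
# Toy illustration of upsampling to a 1:1 ratio (hypothetical data,
# not the actual step_upsample() implementation from themis)
set.seed(1)
quality <- factor(c(rep(5, 6), rep(6, 10), rep(8, 2)))  # imbalanced classes
target  <- max(table(quality))                          # majority-class count (10)

# For each class, draw row indices with replacement up to the target count
balanced_idx <- unlist(lapply(levels(quality), function(q) {
  idx <- which(quality == q)
  sample(idx, size = target, replace = TRUE)
}))

table(quality[balanced_idx])  # every class now has `target` observations
```

Note that this only duplicates existing minority rows; it does not create genuinely new observations.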

In [102]:
options(repr.plot.width=7, repr.plot.height=5)

count_plot <- summary_wine_train_data |>
            ggplot(aes(x=quality,y=total_count,fill=quality)) +
            geom_bar(stat="identity") +
            labs(x="Quality",y="Count") +
            ggtitle("Fig - 2: Distribution of Predictor Variables.")

count_plot
[Figure: Fig 2 — bar chart of observation counts per quality level]

From the bar graph above, we can easily see that the quality levels are far from evenly distributed.


We will start by creating a recipe that rebalances our dataset by oversampling the minority quality levels to a 1:1 ratio with the majority level.

In [103]:
# Set the seed. Don't remove this!
set.seed(9999) 

wine_recipe <- recipe(quality ~ volatile_acidity + citric_acid + residual_sugar + sulphates, data = wine_train) |>
                step_upsample(quality, over_ratio = 1, skip=FALSE) |>
                prep()

upsampled_wine_train <- bake(wine_recipe, wine_train)

summary_wine_train_data2 <- upsampled_wine_train |>
                    group_by(quality) |>
                    summarize(total_count=n(),
                             percentage=(100*n()/nrow(wine_train)))

summary_wine_train_data2
print("Table 5: Summary of Upsampled Training Data")
A tibble: 7 × 3
  quality  total_count  percentage
  <fct>    <int>        <dbl>
  3        1647         44.84073
  4        1647         44.84073
  5        1647         44.84073
  6        1647         44.84073
  7        1647         44.84073
  8        1647         44.84073
  9        1647         44.84073
[1] "Table 5: Summary of Upsampled Training Data"

We can see in the summary table above that all our class values are now balanced. We will use this upsampled data to create a recipe that scales and centres all the predictors, making the data ready for training the model.
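Scaling matters for KNN because unscaled Euclidean distances are dominated by variables measured on larger scales. A quick toy illustration (the two "wines" below are made-up numbers chosen for illustration only):

```r
# Two toy wines: residual sugar spans roughly 0-20+, sulphates only ~0.2-1.1
a <- c(residual_sugar = 1.6,  sulphates = 0.49)
b <- c(residual_sugar = 20.7, sulphates = 0.45)

# Unscaled Euclidean distance is driven almost entirely by residual sugar
d_raw <- sqrt(sum((a - b)^2))

# After standardising each variable, both contribute equally to the distance
standardise <- function(x) (x - mean(x)) / sd(x)
m <- apply(rbind(a, b), 2, standardise)
d_scaled <- sqrt(sum((m[1, ] - m[2, ])^2))
```

This is the motivation for the step_scale and step_center steps in the recipe below.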

In [104]:
wine_recipe_upsampled <- recipe(quality ~ volatile_acidity + citric_acid + residual_sugar + sulphates, data = upsampled_wine_train) |>
                        step_scale(all_predictors()) |>
                        step_center(all_predictors())

To obtain the optimal k value for the k-nearest-neighbor classification algorithm, we apply 5-fold cross-validation, which divides the training set into five validation folds. Averaging accuracy over multiple validation sets gives us a more precise estimate of the classification model's accuracy, which helps us choose the best number of neighbors.
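The fold assignment behind vfold_cv() can be sketched in base R; this is a simplified illustration (without the class stratification that vfold_cv performs), not the rsample implementation:

```r
# Simplified 5-fold assignment (illustration only; vfold_cv() additionally
# stratifies by class, which this sketch omits)
set.seed(2)
n     <- 20                                 # pretend we have 20 rows
folds <- sample(rep(1:5, length.out = n))   # shuffle fold labels across rows

# Fold k plays the validation set once; the other four folds train the model
for (k in 1:5) {
  val_rows   <- which(folds == k)
  train_rows <- which(folds != k)
  # ... fit on train_rows, compute accuracy on val_rows ...
}

table(folds)  # each fold holds n / 5 = 4 rows
```

Each observation is validated exactly once, so the five accuracy estimates can be averaged without overlap.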

In [105]:
# Set the seed. Don't remove this!
set.seed(1234) 

knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
            set_engine("kknn") |>
            set_mode("classification")

k_vals <- tibble(neighbors = seq(from=1,to=10,by=1))

wine_train_vfold <- vfold_cv(upsampled_wine_train, v=5,strata=quality)

vfold_metrics <- workflow() |>
                    add_recipe(wine_recipe_upsampled) |>
                    add_model(knn_spec) |>
                    tune_grid(resamples=wine_train_vfold, grid=k_vals) |>
                    collect_metrics()

accuracies <- vfold_metrics |> filter(.metric=="accuracy")

accuracies
print("Table 6: Accuracy Table of K")
A tibble: 10 × 7
  neighbors  .metric   .estimator  mean       n      std_err      .config
  <dbl>      <chr>     <chr>       <dbl>      <int>  <dbl>        <chr>
  1          accuracy  multiclass  0.9140447  5      0.001501568  Preprocessor1_Model01
  2          accuracy  multiclass  0.9142183  5      0.001460256  Preprocessor1_Model02
  3          accuracy  multiclass  0.8580992  5      0.002459592  Preprocessor1_Model03
  4          accuracy  multiclass  0.8483867  5      0.001979608  Preprocessor1_Model04
  5          accuracy  multiclass  0.8220174  5      0.003123377  Preprocessor1_Model05
  6          accuracy  multiclass  0.8133420  5      0.002121756  Preprocessor1_Model06
  7          accuracy  multiclass  0.7950386  5      0.001175203  Preprocessor1_Model07
  8          accuracy  multiclass  0.7904438  5      0.001072522  Preprocessor1_Model08
  9          accuracy  multiclass  0.7778655  5      0.000909826  Preprocessor1_Model09
  10         accuracy  multiclass  0.7678896  5      0.001576265  Preprocessor1_Model10
[1] "Table 6: Accuracy Table of K"

From the accuracy table, we can already see that K = 2 would be best. We will plot a line graph to visualise this.

In [106]:
options(repr.plot.width=7,repr.plot.height=7)

accuracy_vs_k <- ggplot(accuracies, aes(x=neighbors, y=mean)) +
                    geom_point() +
                    geom_line() +
                    labs(x="Neighbors", y="Accuracy Estimate") +
                    scale_x_continuous(limits=c(1,10), breaks=1:10) +
                    theme(text=element_text(size=12)) +
                    ggtitle("Fig-3: Accuracy Vs K")
accuracy_vs_k
[Figure: Fig 3 — accuracy estimate vs. number of neighbors K]

According to the accuracy vs. k-neighbors line plot above (Fig 3), the curve peaks at k = 2, which indicates that our classification model would return the most accurate predictions at k = 2. As a result, we retrained the model on the training dataset with k = 2.

In [107]:
# Set the seed. Don't remove this!
set.seed(9999) 

#recreating spec with best K
wine_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 2) |>
            set_engine("kknn") |>
            set_mode("classification")

#recreating the model with the new recipe
wine_fit <- workflow() |>
            add_recipe(wine_recipe_upsampled) |>
            add_model(wine_spec) |>
            fit(data=wine_test)


#predicting the results of wine_test data
wine_test_predictions <- predict(wine_fit, wine_test) |>
                            bind_cols(wine_test)

head(wine_test_predictions)
print("Table 7: Prediction Table of Testing Data")
A tibble: 6 × 13
  .pred_class  fixed_acidity  volatile_acidity  citric_acid  residual_sugar  chlorides  free_sulfur_dioxide  total_sulfur_dioxide  density  p_h  sulphates  alcohol  quality
  <fct>        <dbl>          <dbl>             <dbl>        <dbl>           <dbl>      <dbl>                <dbl>                 <dbl>    <dbl> <dbl>     <dbl>    <fct>
  6            7              0.27              0.36         20.7            0.045      45                   170                   1.0010   3     0.45      88       6
  5            65             0.31              0.14         7.5             0.044      34                   133                   0.9955   322   0.50      95       5
  8            68             0.26              0.42         1.7             0.049      41                   122                   0.9930   347   0.48      105      8
  6            66             0.27              0.41         1.3             0.052      16                   142                   0.9951   342   0.47      10       6
  6            69             0.24              0.35         1.0             0.052      35                   146                   0.9930   345   0.44      10       6
  6            85             0.24              0.39         10.4            0.044      20                   142                   0.9974   32    0.53      10       6
[1] "Table 7: Prediction Table of Testing Data"

Now we will check the accuracy of the predictions using metrics() and inspect the distribution of predicted versus true labels with a confusion matrix.

In [108]:
# filtering the accuracy by comparing the predicted and truth column
wine_test_predictions |> metrics(truth=quality, estimate=.pred_class) |> filter(.metric == "accuracy")
print("Table 8: Accuracy of Prediction of Testing Data")
A tibble: 1 × 3
  .metric   .estimator  .estimate
  <chr>     <chr>       <dbl>
  accuracy  multiclass  0.9983673
[1] "Table 8: Accuracy of Prediction of Testing Data"
In [109]:
# Creating a confusion matrix to understand the distribution of correct and incorrect labels
wine_confusion <- wine_test_predictions |> conf_mat(truth=quality, estimate=.pred_class)
wine_confusion
print("Table 9: Confusion Matrix")
          Truth
Prediction   3   4   5   6   7   8   9
         3   7   0   0   0   0   0   0
         4   0  44   0   0   0   0   0
         5   0   0 359   1   0   0   0
         6   0   0   0 549   0   0   0
         7   0   0   0   1 216   0   0
         8   0   0   0   0   0  46   0
         9   0   0   0   0   0   0   2
[1] "Table 9: Confusion Matrix"

From both the accuracy metric and the confusion matrix, we can observe that the vast majority of observations were classified correctly. The only two incorrect predictions occurred for observations whose true quality was 6. The model demonstrated an accuracy of 99.8%, reflecting the success and usefulness of our wine quality classification model.
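Beyond overall accuracy, per-class recall can be read directly off the confusion matrix. The base-R check below (our own supplementary calculation, using the counts from Table 9) confirms the accuracy figure and shows that quality 6 is the only class with any misclassified observations:

```r
# Recompute overall accuracy and per-class recall from the counts in Table 9
conf <- matrix(c(7,  0,   0,   0,   0,  0, 0,
                 0, 44,   0,   0,   0,  0, 0,
                 0,  0, 359,   1,   0,  0, 0,
                 0,  0,   0, 549,   0,  0, 0,
                 0,  0,   0,   1, 216,  0, 0,
                 0,  0,   0,   0,   0, 46, 0,
                 0,  0,   0,   0,   0,  0, 2),
               nrow = 7, byrow = TRUE,
               dimnames = list(Prediction = as.character(3:9),
                               Truth      = as.character(3:9)))

accuracy <- sum(diag(conf)) / sum(conf)                      # 1223 / 1225
recall   <- setNames(diag(conf) / colSums(conf), colnames(conf))
recall[["6"]]                                                # 549 / 551
```

Every class other than quality 6 has a recall of exactly 1 on this test set.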

We can visualise the relationships between our predictors by comparing the predicted and true quality values from wine_test_predictions.

In [110]:
## Setting the Width & Height of the Plot
options(repr.plot.width=15,repr.plot.height=25)

# Function to create scatter plots for variable pairs
create_scatter_plot <- function(data, x_var, y_var, color_var, title_suffix, fig_number) {
  plot <- ggplot(data, aes(x = .data[[x_var]], y = .data[[y_var]], color = .data[[color_var]])) +
    geom_point(alpha = 0.5) +
    labs(x = paste("The", x_var, "of the white wine"),
         y = paste("The", y_var, "of the white wine"),
         color = color_var) +
    theme(text = element_text(size = 10)) +
    ggtitle(paste("Fig-", fig_number, ": The wine quality illustrated by", x_var, "vs", y_var, " - ", title_suffix))
  
  return(plot)
}

# Variables to create scatter plots for
variables <- c("volatile_acidity", "citric_acid", "residual_sugar", "sulphates")

# Empty list to store plots
plot_list <- list()
plot_number <- 3 # Starting plot number

# Loop through combinations of variables
for (i in 1:(length(variables) - 1)) {
  for (j in (i + 1):length(variables)) {
    # Create plot with predicted quality
    plot_number <- plot_number + 1
    plot_list[[length(plot_list) + 1]] <- create_scatter_plot(wine_test_predictions, 
                                                              variables[i], variables[j], 
                                                              ".pred_class", "Predicted", plot_number)
    
    # Create plot with actual quality
    plot_number <- plot_number + 1
    plot_list[[length(plot_list) + 1]] <- create_scatter_plot(wine_test_predictions, 
                                                              variables[i], variables[j], 
                                                              "quality", "Actual", plot_number)
  }
}

plot_grid(plotlist = plot_list, ncol = 2)
[Figure: Figs 4–15 — pairwise scatter plots of the four predictors, coloured by predicted and actual quality]

In the visualisation above, we can see how our predictors relate to both the true and the predicted quality values.

In [111]:
options(repr.plot.width=7,repr.plot.height=7)

test_tbl <- wine_test_predictions %>%
    mutate(correct = .pred_class == quality)

true_plot <- ggplot(test_tbl, aes(x=.pred_class,y=quality,fill=correct)) +
                    geom_bar(stat="identity") +
                    labs(x="Predicted Class", y="Actual Class", fill="Prediction Status") +
                    ggtitle("Fig-16: Distribution of Correct & Wrong Predictions")

true_plot
[Figure: Fig 16 — stacked bars of correct and incorrect predictions by predicted class]

Finally, this visualisation shows the distribution of correct and incorrect predictions across all quality labels. With only two incorrect (red) segments visible, the bar graph is dominated by correct (TRUE) predictions, indicating that our model is extremely accurate.

Discussion¶

Summary of Findings:¶

In this project, we documented our procedure for establishing a classification model that predicts wine quality from a wine's physicochemical attributes. We performed predictor selection using box plots, applied 5-fold cross-validation to select the optimal number of neighbors for the KNN classification model, trained a classifier with the optimal value k = 2, and conducted an accuracy test on the testing dataset. The resulting accuracy of 99.8% suggests that our model could be a valuable tool in decision-making processes related to wine quality assessment.

Expectations & Significance:¶

Our objective was to construct a classification model capable of predicting white wine quality with reasonable accuracy, and our results align with the expectations set at the start of the project. The use of physicochemical attributes for wine quality prediction proved to be a viable approach, and the model's accuracy met the anticipated standard. This supports the assumption that these attributes play a significant role in determining wine quality.

Our exploration of the dataset and subsequent KNN classification yielded promising results. The model's accuracy indicates its potential usefulness in predicting wine quality. Winemakers and industry stakeholders can leverage these insights to enhance product quality and consumer satisfaction (Mor et al., 2022). Winemakers can adjust ingredients during the winemaking process to elevate overall wine quality. Meanwhile, by identifying the factors with minimal impact on quality, they can allocate resources more efficiently, saving both money and time.

Furthermore, our classification model provides a more objective way of judging the quality of a white Vinho Verde. Unlike human tasters, the model determines the wine's quality through a precise evaluation of measured data, enhancing accuracy and eliminating subjective elements from the judgment process. Tasting is complex, and different people may have completely different criteria for judging the quality of white wine (Smith, 2019); human tasters may rely on personal palate preferences rather than the inherent quality of the wine. Therefore, if a human taster gives a markedly different result than our classification model, the taster has the opportunity to reassess the wine, which reduces the probability of erroneous judgments by human experts and increases the overall reliability and accuracy of the white wine evaluation process.

Drawbacks & Questions:¶

While an accuracy of 99.8% is impressive, it is essential to acknowledge potential limitations and scrutinize the model's performance further. A potential weakness of our classification model is the imbalance in the distribution of observations across wine qualities in our training data; the model may not produce comparably accurate predictions for the less abundant classes. We attempted to tackle this problem with the step_upsample function, which resamples the training data to a 1:1 ratio across the strata (quality); however, one downside of this approach for our dataset is that some quality levels have very few observations, so upsampling merely duplicates the same few rows many times. Moreover, our training dataset contains no observations with quality below 3 or above 9, and only a handful at qualities 3 and 9, which indicates potential inaccuracy if predictions are made on wines in those ranges.
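The scale of this duplication can be quantified directly from the class counts in Table 4: upsampling quality 9 (3 rows) to match quality 6 (1,647 rows) repeats each original row hundreds of times on average.

```r
# Average number of copies of each original row after upsampling to 1:1
# (class counts taken from Table 4)
class_counts <- c(`3` = 13, `4` = 119, `5` = 1098, `6` = 1647,
                  `7` = 664, `8` = 129, `9` = 3)
copies <- max(class_counts) / class_counts
round(copies[["9"]])  # each quality-9 row is repeated ~549 times on average
```

Such heavy duplication means the model effectively memorises a handful of rare-class rows rather than learning a general pattern for those qualities.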

The analysis also leads to further questions, including:

  • What are some other limitations of the model, and can it be applied to other types of wine, such as red wine (Ivamoto, 2020)?
  • Are there any other factors affecting wine quality that the model has not considered? Factors such as data quality, feature selection, and many others can influence model accuracy.
  • Given that not all available physicochemical attributes were used in the model, do the chosen attributes for the model truly represent the best choice of predictors?
  • Will this model be useful in different regions with different climate conditions? Will these factors influence the quality of wine?

In conclusion, our classification model offers valuable insights into wine quality prediction. While the current accuracy is promising, ongoing exploration will contribute to its long-term applicability across diverse winemaking scenarios.

Citations¶

  1. Market Reports World. (2023). Market research reports and industry analysis reports. Retrieved from https://www.marketreportsworld.com/enquiry/request-sample/19646179?utm_source=VarikuLinkdenRD
  2. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547–553. https://doi.org/10.1016/j.dss.2009.05.016
  3. Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Wine Quality. UCI Machine Learning Repository. https://doi.org/10.24432/C56S3T
  4. Mor, N. S., Asras, T., Gal, E., Demasia, T., Tarab, E., Ezekiel, N., Nikapros, O., Semimufar, O., Gladky, E., Karpenko, M., Sason, D., Maslov, D., & Mor, O. (2022). Wine quality and type prediction from physicochemical properties using neural networks for machine learning: A free software for winemakers and customers. https://doi.org/10.31222/osf.io/ph4cu
  5. Ivamoto, V. (2020). Wine type and quality prediction with machine learning. https://rstudio-pubs-static.s3.amazonaws.com/565136_b4395e2500ec4c129ab776b9e8dd24de.html
  6. Smith, B. C. (2019). Getting more out of wine: Wine experts, wine apps and sensory science. https://www.sciencedirect.com/science/article/pii/S2214799319300165#sec0035
